Automated Assignment of Medical Subject Headings
Abstract
Methods. A test collection of 200 MEDLINE citations published in 1997, with abstracts in English, was selected at random. The following methods of finding and ranking suitable MeSH descriptors were investigated using this test collection:

The Inquery Algorithm. This algorithm parses text into noun phrases, then uses the Inquery search engine to match them to MeSH descriptors. Co-occurring MeSH descriptors in the UMLS are used to suggest additional headings.

MetaMap. MetaMap develops an ordered list of UMLS Metathesaurus concepts for each citation, based on the noun phrases extracted from its text. A ranked list of concepts is developed for each phrase.

Trigram Algorithm. A phrase is broken into overlapping trigrams (three letters occurring in succession) for analysis. Candidate phrases are obtained from the title and abstract by examining all maximal contiguous sets of words that contain no punctuation or stop words (from a list of 310 common stop words). The trigrams are used to match phrases in the UMLS, and the UMLS concept whose trigram set has the maximal overlap with the candidate phrase is suggested.

Restricting to MeSH. Once a UMLS concept has been identified, using the MetaMap method or the Trigram algorithm, the task becomes one of navigating within the UMLS to find the appropriate MeSH heading. This method was described previously.

Related Articles Method. This method assumes that the semantic neighbors of a document are the documents in the database that are most similar to it. The similarity between documents is measured by the words they have in common, with some adjustment for document length. The test document is used as the basis for finding similar documents, and the MeSH descriptors assigned to those similar documents are then assigned to the test document.

Clustering and Weighting of Suggested Headings. After one or more of the above methods have been applied, the suggested MeSH headings are clustered. Descriptors close together in the same MeSH trees are given additional weight, as are descriptors known to co-occur with high frequency in MEDLINE. The suggested headings from each method being tested are then presented in rank order.
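
The Trigram Algorithm described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the short stop list (standing in for the 310-word list, which is not reproduced here), and the use of the Dice coefficient to measure "maximal overlap" are all assumptions made for illustration, and the vocabulary is passed in as a plain list of terms rather than read from the UMLS.

```python
import re

# Illustrative stop list only (assumption); the abstract's 310-word list is not given.
STOP_WORDS = {"the", "of", "in", "and", "a", "an", "with", "for", "to",
              "is", "are", "was", "were", "by", "on"}


def candidate_phrases(text):
    """Return maximal contiguous runs of words with no punctuation or stop words."""
    phrases = []
    # Split on punctuation first so that no phrase spans a punctuation mark.
    for segment in re.split(r"[^\w\s-]+", text.lower()):
        run = []
        for word in segment.split():
            if word in STOP_WORDS:
                # A stop word closes the current run.
                if run:
                    phrases.append(" ".join(run))
                run = []
            else:
                run.append(word)
        if run:
            phrases.append(" ".join(run))
    return phrases


def trigrams(phrase):
    """Return the set of overlapping three-character substrings of a phrase."""
    s = phrase.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def dice(a, b):
    """Dice coefficient of two sets (assumed overlap measure, not from the source)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def best_match(phrase, vocabulary):
    """Return (term, score) for the vocabulary term with the largest trigram overlap."""
    target = trigrams(phrase)
    scored = [(dice(target, trigrams(term)), term) for term in vocabulary]
    if not scored:
        return None, 0.0
    score, term = max(scored)
    return term, score
```

Calling `candidate_phrases` on a title or abstract and then `best_match` on each phrase against a list of concept names yields one suggested concept per phrase. Because trigram sets tolerate small spelling and inflection differences, a phrase such as "myocardial infarctions" can still match the term "Myocardial Infarction".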
Publication date: 1999